CopenhagenR
Meetup at Statistics Denmark
February 28, 2017
Business survey with the objective to estimate turnover last year (2016) by industry classes
Variables in data frame
A nice data set in the sense that, apart from last year's turnover, there are no missing values (a bit unrealistic, but no need to complicate things)
load(file="data/pop.RData")
summary (pop)
id employees turnover_2015
Min. :10000001 Min. : 10.00 Min. : 0
1st Qu.:10004046 1st Qu.: 13.00 1st Qu.: 13570
Median :10008092 Median : 20.00 Median : 27127
Mean :10008092 Mean : 59.62 Mean : 133638
3rd Qu.:10012137 3rd Qu.: 38.00 3rd Qu.: 68054
Max. :10016182 Max. :19594.00 Max. :116177834
NA's :8151
region industry
North DK :1699 Manufacturing :3517
Central DK :3827 Construction :2971
Southern DK:3561 Trade and transport :6221
Capital :5275 Information and communication: 940
Zealand :1820 Financial and insurance : 113
Real estate : 222
Other business services :2198
table (pop$industry, pop$region)
North DK Central DK Southern DK Capital
Manufacturing 502 1068 958 609
Construction 363 736 708 710
Trade and transport 607 1353 1410 2144
Information and communication 59 149 86 608
Financial and insurance 7 13 14 71
Real estate 22 50 38 91
Other business services 139 458 347 1042
Zealand
Manufacturing 380
Construction 454
Trade and transport 707
Information and communication 38
Financial and insurance 8
Real estate 21
Other business services 212
We want to estimate turnover by industry in 2016
Observation: Distribution of 2015 turnover is extremely skewed to the right…
plot(ecdf(pop$turnover_2015),
main="Distribution of turnover in year t-1",
xlab="Turnover [1000dkk]",
col="blue")
plot(ecdf(log(pop$turnover_2015+0.0001)),
main="Distribution of log(turnover in year t-1)",
xlab="log(Turnover)",
col="blue")
Correlation between the two numerical variables in the data
cor (pop[c("employees", "turnover_2015")], use="pairwise.complete.obs")
employees turnover_2015
employees 1.0000000 0.4388757
turnover_2015 0.4388757 1.0000000
A bit difficult to see due to the skewed distribution
plot (pop$employees, pop$turnover_2015, col="blue")
plot (log(pop$employees), log(pop$turnover_2015), col="blue")
Assumption: Turnover at time \( t \) is highly correlated with turnover at \( t-1 \), hence we use turnover in 2015 to determine needed sample size.
We characterise the distribution of turnover_2015:
(m_x <- mean (pop$turnover_2015, na.rm=TRUE))
[1] 133638.3
(s_x <- sd (pop$turnover_2015, na.rm=TRUE))
[1] 1444230
A relative standard error (se) of 2% corresponds to a 95% confidence half-width of
m_x * 0.02 * 1.96
[1] 5238.622
So, say we want the 95% confidence interval for the mean to be at most 134,000 +/- 5,000; then we need a sample of size
library (samplingbook)
(ss <- sample.size.mean(e=5000, S=s_x, N=nrow(pop), level=0.95))
sample.size.mean object: Sample size for mean estimate
With finite population correction: N=16182, precision e=5000 and standard deviation S=1444230
Sample size needed: 15405
We would normally estimate the total turnover rather than the mean, but if the population size is known without error (a reasonable assumption) there is only a difference of a factor \( N \), so the calculations for the mean suffice
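The required sample size can also be reproduced by hand from the textbook formula \( n_0 = (zS/e)^2 \), \( n = n_0 / (1 + n_0/N) \). A sketch, plugging in the (rounded) figures from the output above:

```r
# Hand check of the required sample size with finite population
# correction; S, e and N are taken from the output above
z  <- qnorm(0.975)                 # ~1.96
S  <- 1444230                      # sd of turnover_2015 (rounded)
e  <- 5000                         # desired half-width of the 95% CI
N  <- 16182                        # population size
n0 <- (z * S / e)^2                # sample size ignoring the fpc
n  <- ceiling(n0 / (1 + n0 / N))
n                                  # 15405, reproducing sample.size.mean()
```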
We confirm the impractically large sample size by calculating the expected standard error by hand
n <- ss$n
N <- nrow (pop)
(se_x <- sqrt (1-n/N) * s_x / sqrt (n))
[1] 2549.764
A 95% confidence interval indeed has (half) width 5,000
1.96 * se_x
[1] 4997.538
With simple random sampling we need a very large sample because of the extremely skewed population. The solution is stratification by size.
We would normally also stratify by industry class, for two reasons: 1) the effect of size varies by industry class, and 2) it makes estimation at the industry level easier too
Package options: SamplingStrata (Optimal Stratification of Sampling Frames for Multipurpose Sampling Surveys) and stratification (Univariate Stratification of Survey Populations)
However, for now we choose a very simple solution
pop$size <- cut (pop$employees,
c(10,100,1000,1e7),
right=FALSE,
include.lowest = TRUE,
labels=c("10-99","100-999","1000+"))
The result of a stratification by two factors can be seen immediately. Note that we have two factors with 7 and 3 levels respectively, but only 20 non-empty strata (there are no real estate companies with 1000+ employees)
table (pop$industry, pop$size)
10-99 100-999 1000+
Manufacturing 3006 483 28
Construction 2878 84 9
Trade and transport 5778 417 26
Information and communication 836 94 10
Financial and insurance 60 44 9
Real estate 206 16 0
Other business services 2033 153 12
It is very often convenient to create a stratum variable, even though many functions have a formula-type argument for strata
We name each stratum in a straightforward manner
Strata <- unique (pop[,c("industry", "size")])
Strata <- Strata[order(Strata$industry, Strata$size),]
rownames(Strata) <- NULL
Strata$stratum <- 1:dim(Strata)[1]
Strata <- Strata[c(3,1,2)]
Strata
stratum industry size
1 1 Manufacturing 10-99
2 2 Manufacturing 100-999
3 3 Manufacturing 1000+
4 4 Construction 10-99
5 5 Construction 100-999
6 6 Construction 1000+
7 7 Trade and transport 10-99
8 8 Trade and transport 100-999
9 9 Trade and transport 1000+
10 10 Information and communication 10-99
11 11 Information and communication 100-999
12 12 Information and communication 1000+
13 13 Financial and insurance 10-99
14 14 Financial and insurance 100-999
15 15 Financial and insurance 1000+
16 16 Real estate 10-99
17 17 Real estate 100-999
18 18 Other business services 10-99
19 19 Other business services 100-999
20 20 Other business services 1000+
pop <- merge (pop, Strata, by=c("industry", "size"))
head (pop)
industry size id employees turnover_2015 region stratum
1 Construction 10-99 10002103 12 33458 Zealand 4
2 Construction 10-99 10007933 75 NA Central DK 4
3 Construction 10-99 10000600 45 NA Capital 4
4 Construction 10-99 10007813 10 17194 Zealand 4
5 Construction 10-99 10002056 15 NA North DK 4
6 Construction 10-99 10004287 32 66467 Zealand 4
We now want two summary statistics at stratum level: the size of each stratum (Nh) and the standard deviation of last year's turnover (sh)
Nh <- aggregate (pop$stratum,
list(stratum=pop$stratum),
length)
names(Nh)[2] <- "Nh"
sh <- aggregate (pop$turnover_2015,
list(stratum=pop$stratum),
sd, na.rm=TRUE)
names(sh)[2] <- "sh"
Strata <- merge.data.frame(Strata, Nh, by="stratum")
Strata <- merge.data.frame(Strata, sh, by="stratum")
These steps can probably be accomplished by very few lines using clever programming…
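Indeed, the two aggregate() calls and the merges can be collapsed into a single pass over the data. A sketch on toy data ('toy' stands in for the real pop data frame):

```r
# One-pass alternative to the two aggregate() calls above
# (a sketch; toy data stand in for the real pop data frame)
toy <- data.frame(stratum       = c(1, 1, 1, 2, 2),
                  turnover_2015 = c(10, 20, NA, 5, 7))
toyStrata <- do.call(rbind, lapply(split(toy, toy$stratum), function(d)
  data.frame(stratum = d$stratum[1],
             Nh = nrow(d),                          # NAs count towards Nh
             sh = sd(d$turnover_2015, na.rm = TRUE))))
toyStrata   # Nh = 3 and 2; sh ~ 7.07 and 1.41
```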
Strata
stratum industry size Nh sh
1 1 Manufacturing 10-99 3006 220529.02
2 2 Manufacturing 100-999 483 1450695.67
3 3 Manufacturing 1000+ 28 5853371.52
4 4 Construction 10-99 2878 44177.32
5 5 Construction 100-999 84 327835.73
6 6 Construction 1000+ 9 885385.85
7 7 Trade and transport 10-99 5778 255558.52
8 8 Trade and transport 100-999 417 1858479.04
9 9 Trade and transport 1000+ 26 31653522.40
10 10 Information and communication 10-99 836 89874.69
11 11 Information and communication 100-999 94 347359.55
12 12 Information and communication 1000+ 10 2797767.20
13 13 Financial and insurance 10-99 60 221949.99
14 14 Financial and insurance 100-999 44 267252.77
15 15 Financial and insurance 1000+ 9 867862.30
16 16 Real estate 10-99 206 75445.11
17 17 Real estate 100-999 16 102247.21
18 18 Other business services 10-99 2033 72893.30
19 19 Other business services 100-999 153 376541.99
20 20 Other business services 1000+ 12 1621369.72
We now need to find an allocation, i.e. how many units should be selected in each stratum (nh)
\[ n_h = n \frac{N_h s_h}{\sum _{h=1}^H (N_h s_h)} \]
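This formula (Neyman allocation) can also be evaluated directly. A sketch with made-up stratum sizes and standard deviations, not the real strata (the `_toy` suffix avoids masking the Nh and sh objects above):

```r
# Neyman allocation computed directly from
# n_h = n * N_h s_h / sum(N_h s_h); toy values, not the real strata
Nh_toy <- c(3006, 483, 28)
sh_toy <- c(220529, 1450696, 5853372)
nh_toy <- round(1000 * Nh_toy * sh_toy / sum(Nh_toy * sh_toy))
nh_toy   # 434 459 107 -- sums to the intended n = 1000 here
```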
n <- 5000
(allo <- stratasamp (n, Nh=Strata$Nh, Sh=Strata$sh, type="opt"))
Stratum 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Size 640 677 158 123 27 8 1426 749 795 73 32 27 13 11 8 15 2 143 56
Stratum 20
Size 19
…in many cases the optimal number exceeds the number of available units
Strata$nh <- allo[2,]
rm (allo)
(TooMany <- (Strata$nh > Strata$Nh))
[1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
[12] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
sum (TooMany)
[1] 6
In these cases we select all units (a take-all stratum)
Strata$nh[TooMany] <- Strata$Nh[TooMany]
Strata$nh[!TooMany] <- NA
And the remaining sample is allocated using the same function
n_rem <- n - sum(Strata$nh, na.rm=TRUE)
s <- Strata [!TooMany, ]
(allo <- stratasamp (n_rem, Nh=s$Nh, Sh=s$sh))
Stratum 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Size 795 762 22 2 1529 221 25 16 12 2 55 4 538 40
Now things are ok
s$nh <- allo[2,]
sum (s$nh > s$Nh)
[1] 0
And we have an allocation
Strata <- rbind (Strata[TooMany,], s)
sum (Strata$nh)
[1] 4999
Tidy up a bit and calculate the sampling fraction
Strata <- Strata[order(Strata$stratum),]
Strata$f <- Strata$nh / Strata$Nh
stratum industry size Nh nh f
1 1 Manufacturing 10-99 3006 795 0.2644711
2 2 Manufacturing 100-999 483 483 1.0000000
3 3 Manufacturing 1000+ 28 28 1.0000000
4 4 Construction 10-99 2878 762 0.2647672
5 5 Construction 100-999 84 22 0.2619048
6 6 Construction 1000+ 9 2 0.2222222
7 7 Trade and transport 10-99 5778 1529 0.2646244
8 8 Trade and transport 100-999 417 417 1.0000000
9 9 Trade and transport 1000+ 26 26 1.0000000
10 10 Information and communication 10-99 836 221 0.2643541
11 11 Information and communication 100-999 94 25 0.2659574
12 12 Information and communication 1000+ 10 10 1.0000000
13 13 Financial and insurance 10-99 60 16 0.2666667
14 14 Financial and insurance 100-999 44 12 0.2727273
15 15 Financial and insurance 1000+ 9 2 0.2222222
16 16 Real estate 10-99 206 55 0.2669903
17 17 Real estate 100-999 16 4 0.2500000
18 18 Other business services 10-99 2033 538 0.2646335
19 19 Other business services 100-999 153 40 0.2614379
20 20 Other business services 1000+ 12 12 1.0000000
Now it is easy to select the sample:
library (sampling)
set.seed (26743583)
pop <- pop[order(pop$stratum),]
s <- sampling::strata (pop,
stratanames="stratum",
size=Strata$nh,
method="srswor")
Finally extract the sample and save it (together with population and stratum level data)
MySample <- getdata (pop, s)
save (MySample, Strata, pop, file="data/41_sampling.Rdata")
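As an aside, stratified SRSWOR can also be drawn with base R by sampling row indices within each stratum. A sketch on toy data ('toypop' stands in for pop, 'nh_toy' for Strata$nh):

```r
# Base-R alternative to sampling::strata(): SRSWOR within each stratum
# (a sketch; toy data stand in for pop and the real allocation)
set.seed(1)
toypop <- data.frame(id = 1:10, stratum = rep(1:2, each = 5))
nh_toy <- c(2, 3)                       # sample sizes per stratum
idx <- unlist(mapply(sample,
                     split(seq_len(nrow(toypop)), toypop$stratum),
                     nh_toy, SIMPLIFY = FALSE))
toysample <- toypop[idx, ]
table(toysample$stratum)                # 2 units from stratum 1, 3 from 2
```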
Load the package and data
library(validate)
data(mtcars)
Introduce errors
mtcars$am <- sample(0:2, nrow(mtcars), replace = TRUE)
mtcars$mpg <- mtcars$mpg * sample(c(-1, 1), nrow(mtcars), replace = TRUE)
Confront the data with rules
cf <- check_that(mtcars, mpg > 0,
cyl %% 1 == 0, am %in% c(0,1))
Plot the results
barplot(cf, main="Errors in the mtcars dataset")
Using aggregate
library(magrittr)
cf %>% aggregate(by = "record") %>% head
npass nfail nNA rel.pass rel.fail rel.NA
1 2 1 0 0.6666667 0.3333333 0
2 3 0 0 1.0000000 0.0000000 0
3 1 2 0 0.3333333 0.6666667 0
4 3 0 0 1.0000000 0.0000000 0
5 2 1 0 0.6666667 0.3333333 0
6 3 0 0 1.0000000 0.0000000 0
Predefine rules
v1 <- validator(am >= 0, cyl < 5)
confront(mtcars, v1)
Object of class 'validation'
Call:
confront(x = mtcars, dat = v1)
Confrontations: 2
With fails : 1
Warnings : 0
Errors : 0
Create rules with YAML
writeLines(readLines("data/rules.yaml"))
rules :
-
expr : am %in% c(0,1)
name : gearModes
description : Either automatic or not
-
expr : mpg > 10
name : minimumMPG
description : EU rules dictate minimum mpg
---
validator(.file = "data/rules.yaml") %>% confront(mtcars, .)
Object of class 'validation'
Call:
confront(x = mtcars, dat = .)
Confrontations: 2
With fails : 2
Warnings : 0
Errors : 0
Some errors are believed to be systematic (i.e. we think we know the process that caused the error). Once the mechanism is known, the solution is obvious.
Some of these errors can be detected and corrected with R.
Frequently seen systematic errors
dat <- data.frame(
Turnover = c(1000, 353),
Cost = c(80, 283),
Profit = c(200, 115))
dat
Turnover Cost Profit
1 1000 80 200
2 353 283 115
The first observation is obviously erroneous, and there is something wrong with the second as well
Edits can be hard coded
library (editrules)
E <- editmatrix("Turnover - Cost == Profit")
E
Edit matrix:
Cost Profit Turnover Ops CONSTANT
num1 -1 -1 1 == 0
Edit rules:
num1 : Turnover == Cost + Profit
Load package deducorrect and use function correctTypos()
library (deducorrect)
res <- correctTypos (E, dat)
Result can be viewed like this
res$corrected
Turnover Cost Profit
1 1000 800 200
2 353 238 115
dat
Turnover Cost Profit
1 1000 80 200
2 353 283 115
res$corrections
row variable old new
1 1 Cost 80 800
2 2 Cost 283 238
The first correction was pretty straightforward; the second was harder to detect…
Here we have four different business areas with corresponding revenue and costs:
dat <- data.frame(
x0 = c(195, 610),
x1.r = c(2100, 5100),
x1.c = c(1950, 4650),
x1 = c(150, 450),
x2.r = c(0, 0),
x2.c = c(10, 130),
x2 = c(10, 130),
x3.r = c(20, 20),
x3.c = c(5, 0),
x3 = c(15, 20),
x4.r = c(50, 15),
x4.c = c(10, 25),
x4 = c(40, 10))
dat
x0 x1.r x1.c x1 x2.r x2.c x2 x3.r x3.c x3 x4.r x4.c x4
1 195 2100 1950 150 0 10 10 20 5 15 50 10 40
2 610 5100 4650 450 0 130 130 20 0 20 15 25 10
The rules are pretty straightforward
E <- editmatrix(c("x0 == x1 + x2 + x3 + x4",
"x1 == x1.r - x1.c",
"x2 == x2.r - x2.c",
"x3 == x3.r - x3.c",
"x4 == x4.r - x4.c"))
Confronting the data reveals a number of errors:
E
Edit matrix:
x0 x1 x2 x3 x4 x1.c x1.r x2.c x2.r x3.c x3.r x4.c x4.r Ops CONSTANT
num1 1 -1 -1 -1 -1 0 0 0 0 0 0 0 0 == 0
num2 0 1 0 0 0 1 -1 0 0 0 0 0 0 == 0
num3 0 0 1 0 0 0 0 1 -1 0 0 0 0 == 0
num4 0 0 0 1 0 0 0 0 0 1 -1 0 0 == 0
num5 0 0 0 0 1 0 0 0 0 0 0 1 -1 == 0
Edit rules:
num1 : x0 == x1 + x2 + x3 + x4
num2 : x1 + x1.c == x1.r
num3 : x2 + x2.c == x2.r
num4 : x3 + x3.c == x3.r
num5 : x4 + x4.c == x4.r
(violatedEdits(E, dat))
edit
record num1 num2 num3 num4 num5
1 TRUE FALSE TRUE FALSE FALSE
2 FALSE FALSE TRUE FALSE TRUE
There is even a plot method for violated edits
plot (violatedEdits(E, dat))
Corrections of sign errors by “flipping” sign
dat
x0 x1.r x1.c x1 x2.r x2.c x2 x3.r x3.c x3 x4.r x4.c x4
1 195 2100 1950 150 0 10 10 20 5 15 50 10 40
2 610 5100 4650 450 0 130 130 20 0 20 15 25 10
150+10+15+40
[1] 215
res1 <- correctSigns (E, dat)
res1$corrected
x0 x1.r x1.c x1 x2.r x2.c x2 x3.r x3.c x3 x4.r x4.c x4
1 195 2100 1950 150 0 10 -10 20 5 15 50 10 40
2 610 5100 4650 450 0 -130 130 20 0 20 -15 -25 10
res1$corrections
row variable old new
1 1 x2 10 -10
2 2 x2.c 130 -130
3 2 x4.c 25 -25
4 2 x4.r 15 -15
Were the corrections correct?
dat
x0 x1.r x1.c x1 x2.r x2.c x2 x3.r x3.c x3 x4.r x4.c x4
1 195 2100 1950 150 0 10 10 20 5 15 50 10 40
2 610 5100 4650 450 0 130 130 20 0 20 15 25 10
res1$corrected
x0 x1.r x1.c x1 x2.r x2.c x2 x3.r x3.c x3 x4.r x4.c x4
1 195 2100 1950 150 0 10 -10 20 5 15 50 10 40
2 610 5100 4650 450 0 -130 130 20 0 20 -15 -25 10
Looking at the numbers suggests that in row 2, x4.c and x4.r were probably not two sign errors but a single error with swapped fields.
We create a list of variable pairs that are allowed to be swapped
dat
x0 x1.r x1.c x1 x2.r x2.c x2 x3.r x3.c x3 x4.r x4.c x4
1 195 2100 1950 150 0 10 10 20 5 15 50 10 40
2 610 5100 4650 450 0 130 130 20 0 20 15 25 10
PairList = list (c("x1.r","x1.c"), c("x2.r","x2.c"), c("x3.r","x3.c"), c("x4.r","x4.c"))
We now allow this type of error to be detected
res2 <- correctSigns (E, dat, swap = PairList)
res2$corrected
x0 x1.r x1.c x1 x2.r x2.c x2 x3.r x3.c x3 x4.r x4.c x4
1 195 2100 1950 150 0 10 -10 20 5 15 50 10 40
2 610 5100 4650 450 0 -130 130 20 0 20 25 15 10
res2$corrections
row variable old new
1 1 x2 10 -10
2 2 x2.c 130 -130
3 2 x4.c 25 15
4 2 x4.r 15 25
res1$status
status weight degeneracy nflip nswap
1 corrected 1 1 1 0
2 corrected 3 1 3 0
res2$status
status weight degeneracy nflip nswap
1 corrected 1 1 1 0
2 corrected 2 2 1 1
Read the data again
load(file="data/41_sampling.RData")
By magic we attach the survey responses (with a 100% response rate!)
load(file="data/resp.RData")
sampresp <- merge (MySample, resp, by="id", all.x = TRUE, all.y = FALSE)
sampresp [,c("ID_unit", "Stratum", "turnover_2015")] <- NULL
sampresp <- merge.data.frame(sampresp, Strata[,c("stratum","Nh")], by="stratum")
head (sampresp)
stratum id industry size employees region Prob
1 1 10000004 Manufacturing 10-99 15 Central DK 0.2644711
2 1 10000006 Manufacturing 10-99 27 North DK 0.2644711
3 1 10006984 Manufacturing 10-99 13 Southern DK 0.2644711
4 1 10015258 Manufacturing 10-99 48 Zealand 0.2644711
5 1 10000015 Manufacturing 10-99 38 Central DK 0.2644711
6 1 10000026 Manufacturing 10-99 26 Central DK 0.2644711
turnover_2016 Nh
1 14223 3006
2 35050 3006
3 6729 3006
4 73836 3006
5 46489 3006
6 272710 3006
Load package and create a survey object
library (survey)
stsi.design <- svydesign(id=~1,
strata=~stratum,
fpc=~Nh,
data=sampresp)
Estimation of survey total
(ty_hat <- svytotal(~turnover_2016, stsi.design))
total SE
turnover_2016 1849924516 21804593
Compare with the true total in the population (in billion DKK)
(ty <- sum(resp$turnover_2016)/1e6)
[1] 1883.121
Coefficient of variation (in percent)
cv(ty_hat)*100
turnover_2016
turnover_2016 1.178675
… or a 95% confidence interval in billion DKK (roughly +/- 2 standard errors)
confint(ty_hat)/1e6
2.5 % 97.5 %
turnover_2016 1807.188 1892.661
… or +/- 1 estimated standard error (a 68% confidence interval)
confint(ty_hat, level=0.68)/1e6
16 % 84 %
turnover_2016 1828.241 1871.608
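Under stratified simple random sampling, svytotal is just the textbook estimator \( \hat t = \sum_h N_h \bar y_h \). A hand check on toy data (two strata with known population sizes):

```r
# Hand check of the stratified estimator: t_hat = sum_h N_h * ybar_h
# (toy data; two strata with known population sizes N_h)
library(survey)
toy <- data.frame(stratum = c(1, 1, 2, 2, 2),
                  y       = c(10, 14, 100, 120, 110),
                  N_h     = c(20, 20, 5, 5, 5))
des <- svydesign(id = ~1, strata = ~stratum, fpc = ~N_h, data = toy)
svytotal(~y, des)                               # estimate 790
ybar_h <- tapply(toy$y, toy$stratum, mean)      # 12 and 110
N_h    <- tapply(toy$N_h, toy$stratum, unique)  # 20 and 5
sum(N_h * ybar_h)                               # 790, matching svytotal
```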
Domain estimate by region
ty_region_hat <- svyby(~turnover_2016, ~region, stsi.design, svytotal)
ty_region_hat$cv <- ty_region_hat$se / ty_region_hat$turnover_2016 * 100
ty_region_hat
region turnover_2016 se cv
North DK North DK 110341169 5288767 4.793104
Central DK Central DK 407435506 7796006 1.913433
Southern DK Southern DK 347020018 11325539 3.263656
Capital Capital 884623135 20275801 2.292027
Zealand Zealand 100504688 4845114 4.820784
The sum of the domain estimates is consistent with the total
sum(ty_region_hat$turnover_2016)
[1] 1849924516
ty_hat
total SE
turnover_2016 1849924516 21804593
The purpose of seasonal adjustment is to facilitate month-to-month comparisons (all else being equal).
We do that by decomposing an observed time series into unobserved components using either an additive model \[ X_t = T_t + S_t + I_t \] or a multiplicative model \[ X_t = T_t \cdot S_t \cdot I_t \] where \( T_t \) is the trend, \( S_t \) the seasonal component and \( I_t \) the irregular component.
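The additive model can be illustrated on synthetic data. A sketch: four years of trend plus a fixed seasonal pattern plus noise, decomposed with stats::decompose:

```r
# Additive decomposition of a synthetic monthly series:
# linear trend + fixed seasonal pattern + noise
set.seed(42)
seas_pat <- c(5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4)   # sums to zero
x <- ts(100 + 0.5 * seq_len(48) + rep(seas_pat, 4) + rnorm(48),
        frequency = 12)
dec <- decompose(x, type = "additive")
round(dec$figure, 1)   # estimated seasonal effects, close to seas_pat
```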
Read the data
load(file="data/unemp.RData")
str (unemp)
Time-Series [1:120] from 2007 to 2017: 130100 126542 121465 110681 102920 ...
frequency(unemp)
[1] 12
plot(unemp, col="blue", ylab="Gross unemployment [no. persons]", main="Monthly gross unemployment")
Reshape ts object as data frame and add columns for year and month
library(lubridate)
library (zoo)
X <- data.frame(unemp=as.matrix(unemp), date=as.Date(as.yearmon(time(unemp))))
X$year <- as.ordered(year(X$date))
X$month <- as.ordered(month(X$date))
head (X)
unemp date year month
1 130100 2007-01-01 2007 1
2 126542 2007-02-01 2007 2
3 121465 2007-03-01 2007 3
4 110681 2007-04-01 2007 4
5 102920 2007-05-01 2007 5
6 98808 2007-06-01 2007 6
library (ggplot2)
ggplot(data=X, aes(x=month, y=unemp, group=year, colour=year)) +
geom_line() +
geom_point()
There are many possibilities in base R, like decompose() and stl(), both from the stats package
plot (stats::decompose(unemp,"multiplicative"))
plot (stl (unemp, "periodic"), main="Decomposition using function 'stl'")
tbats from the forecast package is somewhat more complicated (exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components…)
library (forecast)
plot (tbats (unemp))
Seasonally adjust using the forecast::seasadj function:
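A minimal sketch, using the built-in AirPassengers series as a stand-in for unemp:

```r
# forecast::seasadj on an stl fit
# (AirPassengers stands in for the unemp series here)
library(forecast)
fit <- stl(AirPassengers, s.window = "periodic")
sa  <- seasadj(fit)          # original minus the seasonal component
plot(AirPassengers, col = "blue", ylab = "Passengers")
lines(sa, col = "red")
```

seasadj also accepts fits from decompose() and tbats(), so the same call works for the other decompositions above.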
If more control or options are needed, we turn to X-13ARIMA-SEATS (or its predecessor X-12-ARIMA).
We need the ability to add regressors to account for calendar effects (trading days, moving holidays) and outliers
The program X-13ARIMA-SEATS is from the Census Bureau (“Statistics US”). It's a single executable that works through input and output of text files.
The package x12 from Statistics Austria has many functionalities for using X-13ARIMA-SEATS from within R
library (x12)
First create an x12 object
unemp_x12s <- new ("x12Single", ts=unemp, tsName="Gross_unemployment")
The binaries must be installed on the machine
x12path(file.path("c:/data/dst/x13as","x13as.exe"))
Then it is simple to run with default settings
s <- x12 (unemp_x12s)
We can get a lot of information about the seasonal adjustment model using the built-in summary function:
summary(s)
-------------------------- Gross_unemployment ------------------------------------
-----------------------------------------------------------------------------------
Time Series
Frequency: 12
Span: 1st month,2007 to 12th month,2016
Model Definition
ARIMA Model: (1 1 0)(0 1 1) (Automatic Model Choice)
Model Span:
Transformation: Automatic selection : No transformation
Regression Model: none
Outlier Detection
No outlier detection performed
Seasonal Adjustment
Identifiable Seasonality: yes
Seasonal Peaks: none
Trading Day Peaks: none
Overall Index of Quality of SA
(Acceptance Region from 0 to 1)
Q: 0.27
Number of M statistics outside the limits: 0
SA decomposition: additive
Moving average used to estimate the seasonal factors: 3x3
Moving average used to estimate the final trend-cycle: 9-term Henderson filter
The various components of the decomposed series are contained in the x12-object and are easily retrieved:
plot(s@x12Output@a1, col="blue", ylab="Gross unemployment [no. persons]")
lines(s@x12Output@d11, col="red")
legend ("bottomright", c("Original", "Seasonally adjusted"),
col=c("blue", "red"), lty=c(1,1))
A spectral plot can be obtained using the built-in plotSpec function:
plotSpec(s)
Similarly, the seasonal factors can be plotted using the built-in plotSeasFac function:
plotSeasFac(s)
And finally, an ACF-plot is obtained using the plotRsdAcf function:
plotRsdAcf(s)
Questions are welcome!